-- The MAILING DATE of this communication appears on the cover sheet with the correspondence address --
All claims being allowable, PROSECUTION ON THE MERITS IS (OR REMAINS) CLOSED in this application. If not included 
herewith (or previously mailed), a Notice of Allowance (PTOL-85) or other appropriate communication will be mailed in due course. THIS 
3. □ Copies of the certified copies of the priority documents have been received in this national stage application from the
International Bureau (PCT Rule 17.2(a)).
Applicant has THREE MONTHS FROM THE "MAILING DATE" of this communication to file a reply complying with the requirements
noted below. Failure to timely comply will result in ABANDONMENT of this application. 
DETAILED ACTION
Remarks 

1. This office action is in response to the amendment filed on 10/10/2008.

2. Claim 20 has been cancelled by the Applicants. 

3. Claim 21 has been added.

4. Claims 1, 6 and 12-17 have been amended by the Applicants.

5. Claims 1, 5, 6, 10, 12, 16, 17 and 19 are now being further amended by the
Examiner.

6. Claims 1-19 and 21 remain pending and are now being allowed (re-numbered as
claims 1-20).

EXAMINER'S AMENDMENT 

7. An examiner's amendment to the record appears below. Should the changes 
and/or additions be unacceptable to applicant, an amendment may be filed as 
provided by 37 CFR 1.312. To ensure consideration of such an amendment, it 
MUST be submitted no later than the payment of the issue fee. 

8. Authorization for this examiner's amendment was given in a telephone interview 
with Frederick E. Cooperider (Reg# 36,769) on 12/23/2008 and 12/29/2008 to 
obviate any potential 35 U.S.C. § 112 issues, and to put the claims in condition 
for allowance. 

9. The application has been amended as follows: 
IN THE CLAIMS

Please amend claims 1, 5, 6, 10, 12, 16, 17, and 19, as follows:
Claim 1 (Currently amended): 

A software method of improving at least one of efficiency and speed 
in executing a linear algebra subroutine on a computer having a floating point 
unit (FPU) with floating point registers (FRegs) and a load/store unit (LSU) 
capable of overlapping loading data and processing said data by the FPU, said 
FPU being interfaced with an L1 (Level 1) cache and having an L1 cache/FReg
interface "loading penalty of n cycles", n being an integer greater than or equal to
1, during which data is rearranged in up to n cycles in said FRegs because data
arrives out of order for said processing, said method comprising: 

loading matrix data from a memory through a cache system at a fastest 
possible rate; and 

then either immediately or at a later time, for an execution code controlling 
operation of said linear algebra subroutine execution, overlapping by preloading 
data into said FRegs of said FPU and then rearranging the data in said FRegs for
up to said n cycles, said overlapping causing said matrix data to arrive into said
FRegs from said L1 cache to be timely executed by the FPU operations of said
linear algebra subroutine on said FPU.



Claim 5 (Currently amended): 

The method of claim 4, wherein said LAPACK subroutine comprises a Level
3 Basic Linear Algebra Subprograms (BLAS) which includes L1 cache kernel
routines [[BLAS Level 3 L1 cache kernel]].



Claim 6 (Currently amended): 

An apparatus, comprising: 

a memory to store matrix data to be used for processing in a linear 
algebra program; 

an L1 (Level 1) cache to receive data from said memory;

a floating point unit (FPU) to perform said processing; and 

a load/store unit (LSU) to load data to be processed by said FPU, said 
LSU loading said data into a plurality of floating point registers (FRegs), wherein 
said data processing overlaps said data loading such that matrix data is 
preloaded into said FRegs from said L1 cache prior to being required by said
FPU and the preloaded data in said FRegs is rearranged for up to n cycles, n
being an integer greater than or equal to 1.



Claim 10 (Currently amended): 

The apparatus of claim 9, wherein said subroutine comprises a Level 3
Basic Linear Algebra Subprograms (BLAS) which includes L1 cache kernel
routines [[BLAS Level 3 L1 cache kernel]].

Claim 12 (Currently amended):

A computer-readable storage medium tangibly embodying a program of 
machine-readable instructions executable by a digital processing apparatus to 
perform a method of improving at least one of speed and efficiency in executing 
a linear algebra subroutine on a computer having a floating point unit (FPU) and 
a load/store unit (LSU) capable of overlapping loading data and processing said 
data, said method comprising: 

for an execution code controlling operation of said linear algebra 
subroutine execution, overlapping by preloading data into a floating point 
registers (FRegs) of said FPU and rearranging the preloaded data in said FRegs 
for up to n cycles, where n is an integer greater than or equal to 1, said
overlapping causing data from an L1 (Level 1) cache to arrive into said FRegs, to
be timely executed by FPU operations of said linear algebra subroutine on said 
FPU in view of said up to n cycles used for rearranging said preloaded data. 

Claim 16 (Currently amended): 

The system of claim 15, wherein said subroutine comprises a Level 3
Basic Linear Algebra Subprograms (BLAS) which includes L1 cache kernel
routines [[BLAS Level 3 L1 cache kernel]].



Claim 17 (Currently amended): 

A method of providing a service involving at least one of solving
and applying a scientific/engineering problem, said method comprising [[at least one of]]:

using a linear algebra software package that computes one or more matrix 
subroutines, wherein said linear algebra software package generates an 
execution code controlling a load/store unit loading data into a floating point 
registers (FRegs) for a floating point unit (FPU) performing a linear algebra 
subroutine execution, said FPU capable of overlapping loading data and 
performing said linear algebra subroutine processing, such that, for an execution 
code controlling operation of said FPU, said overlapping causes a preloading 
of data from an L1 (Level 1) cache into said FRegs and then rearranges said
preloaded data for up to n cycles, n being an integer greater than or equal to 1,
and wherein a stride one data transfer is used for providing said data for said 
preloading for all operands without using a data copy processing for correcting 
said stride one data transfer for any operand of said linear algebra subroutine; 

providing a consultation for purpose of solving a scientific/engineering 
problem using said linear algebra software package; 

transmitting a result of said linear algebra software package on at least 
one of a network, a signal-bearing medium containing machine-readable data 
representing said result, and a printed version representing said result; and 

receiving a result of said linear algebra software package on at least one
of a network, a signal-bearing medium containing machine-readable data 
representing said result, and a printed version representing said result. 

Claim 19 (Currently amended): 

The method of claim 18, wherein said LAPACK subroutine comprises a
Level 3 Basic Linear Algebra Subprograms (BLAS) which includes L1 cache
kernel routines [[BLAS Level 3 L1 cache kernel]].

--END OF AMENDMENT--

Allowable Subject Matter 

Claims 1-19 and 21 are allowed. As Applicants point out in the remarks 
filed on 10/10/2008, the closest cited prior art of Nakazawa (US 5,438,669) 
and/or Dhablania (US 6,115,730), and/or Mulla (US 6,507,892) fails to teach or
fairly suggest the method and the system for improving efficiency and speed 
in executing a linear algebra subroutine on a computer having a floating point 
unit (FPU) with floating point registers (FRegs) and a load/store unit (LSU) 
capable of overlapping loading data and processing said data by the FPU, said 
FPU being interfaced with an L1 (Level 1) cache and having an L1 cache/FReg
interface "loading penalty of n cycles", n being an integer greater than or equal to
1, during which data is rearranged in up to n cycles in said FRegs because data
arrives out of order for said processing, said method comprising: loading matrix 
data from a memory through a cache system at a fastest possible rate; and then 
either immediately or at a later time, for an execution code controlling operation 
of said linear algebra subroutine execution, overlapping by preloading data into 
said FRegs of said FPU and then rearranging the data in said FRegs for up to 
said n cycles, said overlapping causing said matrix data to arrive into said FRegs 
from said L1 cache to be timely executed by the FPU operations of said linear
algebra subroutine on said FPU as recited in claims 1, 6, 12, and 17. As such,
independent claims 1, 6, 12, and 17 and each of the dependent claims are
allowable for at least the same reasons.
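
The overlap of loading and processing recited above can be illustrated with a minimal sketch. It is not the Applicants' implementation: Python cannot control floating point registers or the L1 cache, so the local variables `next_a` and `next_b` merely stand in for preloaded FRegs, and the function name `dot_with_preload` is hypothetical. The sketch shows only the ordering, in which the operands for the next step are fetched before the current multiply-add is performed.

```python
def dot_with_preload(a, b):
    """Dot product that fetches the next operand pair before using the current one.

    Illustrative only: plain variables stand in for floating point registers.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have equal length")
    if len(a) == 0:
        return 0.0

    # Prologue: preload the first operand pair (stands in for L1 -> FReg loads).
    next_a, next_b = a[0], b[0]
    total = 0.0

    for i in range(len(a)):
        # The operands preloaded during the previous step become current.
        cur_a, cur_b = next_a, next_b

        # Issue the loads for step i + 1 before the arithmetic of step i,
        # so that on real hardware the load latency could be hidden.
        if i + 1 < len(a):
            next_a, next_b = a[i + 1], b[i + 1]

        # Multiply-add on operands that were loaded one step earlier.
        total += cur_a * cur_b

    return total
```

On hardware with a load/store unit that runs concurrently with the FPU, the same ordering would let the load for step i + 1 proceed while the multiply-add for step i executes; the sketch does not model the "up to n cycles" of register rearrangement.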

10. Any comments considered necessary by applicant must be submitted no later
than the payment of the issue fee and, to avoid processing delays, should 
preferably accompany the issue fee. Such submissions should be clearly labeled 
"Comments on Statement of Reasons for Allowance." 

Conclusion 

Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Zheng Wei whose telephone number is (571) 270-1059 
and Fax number is (571) 270-02059. The examiner can normally be reached on 
Monday-Thursday 8:00-15:00. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's
supervisor, Tuan Q. Dam can be reached on (571) 272-3695. The fax phone number 
for the organization where this application or proceeding is assigned is 571-273-8300. 

Any inquiry of a general nature or relating to the status of this application or
proceeding should be directed to the TC 2100 Group receptionist whose telephone 
number is 571-272-1000. 

Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a 
USPTO Customer Service Representative or access to the automated Information 
system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 



/Z. W./ 

Examiner, Art Unit 2192 



/Tuan Q. Dam/ 

Supervisory Patent Examiner, Art Unit 2192 



